Security Level Classification of Confidential Documents Written in Turkish
نویسندگان
چکیده
This article introduces a security level classification methodology of confidential documents written in Turkish language. Internal documents of TUBITAK UEKAE, holding various security levels (unclassified-restrictedsecret) were classified within a methodology using Support Vector Machines (SVM’s) [1] and naïve bayes classifiers [3][9]. To represent term-document relations a recommended metric “TF-IDF" [2] was chosen to construct a weight matrix. Turkic languages provide a very difficult natural language processing problem in comparison with English: “Stemming”. A Turkish stemming tool "zemberek" was used to find out the features without suffix. At the end of the article some experimental results and success metrics are projected.
منابع مشابه
Security-level classification for confidential documents by using adaptive neuro-fuzzy inference systems
.................................................................................................................................. ii ÖZET ............................................................................................................................................ iv TABLE OF CONTENTS ...................................................................................................
متن کاملSymmetric Key Size for Different Level of Information Classification
Information is an important asset to an organization as well as to a nation. Incorrect handling of information may cause economic damage to an organization or cause harm to national security. Some of the information is confidential or sensitive. Confidential information can be categorized into various levels of classification. The classification depends on the level of damage to an organization...
متن کاملText Classification for Data Loss Prevention
Businesses, governments, and individuals leak confidential information, both accidentally and maliciously, at tremendous cost in money, privacy, national security, and reputation. Several security software vendors now offer “data loss prevention” (DLP) solutions that use simple algorithms, such as keyword lists and hashing, which are too coarse to capture the features what makes sensitive docum...
متن کاملUsing Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملUsing Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009